googleAuthR
searchConsoleR
googleAnalyticsR
googleComputeEngineR (Cloudyr)
bigQueryR (Cloudyr)
googleCloudStorageR (Cloudyr)
googleLanguageR (rOpenSci)
googleCloudVisionR
googleKubernetesR
Slack group to talk about the packages: #googleAuthRverse
A 40-minute talk at Google Next '19 with lots of new things to try!
https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be
A great video that goes deeper into Spark clusters, Jupyter notebooks, training with ML Engine, and scaling with Seldon on Kubernetes, which I haven't tried yet
https://www.rocker-project.org/
FROM rocker/tidyverse:3.6.0
MAINTAINER Mark Edmondson (r@sunholo.com)
# install R package dependencies
RUN apt-get update && apt-get install -y \
libssl-dev
## Install packages from CRAN
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR \
googleComputeEngineR \
googleAnalyticsR \
searchConsoleR \
googleCloudStorageR \
bigQueryR \
## install Github packages
&& installGithub.r MarkEdmondson1234/youtubeAnalyticsR \
## clean up
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds
Flexible: no need to ask IT to install R everywhere, just run docker run
Cross-cloud, future-proof(?)
Version controlled: no worries that the latest tidyverse update will break code
Scalable: run multiple Docker containers at once; fits an event-driven, stateless, serverless future
Continuous development with GitHub pushes
Pros
Probably run the same code with no changes needed
Easy to set up
Cons
Expensive
May be better to keep the data in a database
3.75 TB of RAM: $423 a day
library(googleComputeEngineR)
# this will cost a lot
bigmem <- gce_vm("big-mem",
                 template = "rstudio",
                 predefined_type = "n1-ultramem-160")

Pros
Docker infrastructure
library(future)
Cons
Changes to your code for split-map-reduce
Need to write meta-code to handle moving data and code in and out
Not applicable to some problems
New in googleComputeEngineR v0.3 (7th May)
library(googleComputeEngineR)
vms <- gce_vm_cluster()
#2019-03-29 23:24:54> # Creating cluster with these arguments:
#   template = r-base, dynamic_image = rocker/r-parallel,
#   wait = FALSE, predefined_type = n1-standard-1
#2019-03-29 23:25:10> Operation running...
...
#2019-03-29 23:25:25> r-cluster-1 VM running
#2019-03-29 23:25:27> r-cluster-2 VM running
#2019-03-29 23:25:29> r-cluster-3 VM running
...
#2019-03-29 23:25:53> # Testing cluster:
r-cluster-1 ssh working
r-cluster-2 ssh working
r-cluster-3 ssh working

googleComputeEngineR has a custom backend for future:
# create cluster
vms <- gce_vm_cluster("r-vm", cluster_size = 3)
plan(cluster, workers = as.cluster(vms))
# get data
my_files <- list.files("myfolder", full.names = TRUE)
my_data <- lapply(my_files, read.csv)
# forecast data in cluster
library(forecast)
library(future.apply)  # provides future_lapply()
cluster_f <- function(my_data, args = 4){
forecast(auto.arima(ts(my_data, frequency = args)))
}
result <- future_lapply(my_data, cluster_f, args = 4)

Can run multi-layer future loops (use each CPU within each VM)
Thanks to Grant McDermott for figuring out the optimal method (Issue #129)
3 VMs, 8 CPUs each = 24 threads
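The multi-layer setup above can be sketched with future's nested plan topologies; a sketch assuming the `vms` cluster object created earlier:

```r
library(future)
library(future.apply)

# outer layer: one worker per VM
# inner layer: a multisession worker per CPU core on that VM
plan(list(
  tweak(cluster, workers = as.cluster(vms)),
  tweak(multisession)
))

# an outer future_lapply() distributes chunks across the VMs;
# a nested future_lapply() inside the worker function then
# fans out across each VM's cores
```

With 3 VMs of 8 CPUs each, this gives the 24 threads mentioned above.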
Clusters of VMs + Docker + Task controller = Kubernetes
Pros
Auto-scaling, task queues etc.
Scale to billions
Potentially cheaper
May already have cluster
Cons
Needs stateless, idempotent workflows
Message broker?
Minimum 3 VMs
Built with Cloud Build on each GitHub push:
FROM rocker/shiny
MAINTAINER Mark Edmondson (r@sunholo.com)
# install R package dependencies
RUN apt-get update && apt-get install -y \
libssl-dev
## Install packages from CRAN needed for your app
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR \
googleAnalyticsR
## assume shiny app is in build folder /shiny
COPY ./shiny/ /srv/shiny-server/myapp/
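The GitHub-push build itself can be described in a cloudbuild.yaml; a minimal sketch (the image name is illustrative, not the one used in the talk):

```yaml
# build the Dockerfile in the repo root and push the image
# to the project's Container Registry on every trigger run
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-shiny-app', '.']
images:
- 'gcr.io/$PROJECT_ID/my-shiny-app'
```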
Built with Cloud Build on every GitHub push:
FROM trestletech/plumber
# copy your plumbed R script
COPY api.R /api.R
# default is to run the plumbed script
CMD ["api.R"]
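The api.R being copied can be as small as plumber's quickstart example; this sketch matches the /echo endpoint queried at the end:

```r
library(plumber)

#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  paste0("The message is: ", msg)
}
```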
Shiny App:
kubectl run shiny1 --image gcr.io/gcer-public/shiny-googleauthrdemo:latest --port 3838
kubectl expose deployment shiny1 --target-port=3838 --type=NodePort
R plumber API:
kubectl run my-plumber --image gcr.io/your-project/my-plumber --port 8000
kubectl expose deployment my-plumber --target-port=8000 --type=NodePort
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: r-ingress-nginx
spec:
  rules:
  - http:
      paths:
      - path: /gar/
        # app deployed to /gar/shiny/
        backend:
          serviceName: shiny1
          servicePort: 3838
curl 'http://mydomain/api/echo?msg="its alive!"'
#> "The message is: its alive!"